This tutorial provides a brief introduction to mapping data using R. My PhD work focuses on regional dialects, which means we’ll be working with regional data today. We want to map features to help us understand the regional distribution of language varieties, but we could be interested in many other features that are distributed regionally, for example social variables like income or access to education. Mapping is useful in linguistics, but it is also well established in fields like geography and ecology.
In preparation for our maps, we’ll need to load a couple of packages. For mapping we’ll need ‘maps’ for the US, and ‘rnaturalearth’ (together with ‘rnaturalearthhires’ for high-resolution outlines) for other countries. The main work for the maps will be done using ggplot, so we need the tidyverse package as well, and we’ll use ‘sf’ to convert the geo-information into a suitable format.
library(maps) # to get US maps
library(rnaturalearth) # mapping other country outlines
library(rnaturalearthhires) # high resolution
library(tidyverse) # making pretty maps
library(sf) # to change the geo-information to suitable format
The data we’ll be using today is based on a collection of 1 billion Tweets (9 billion words). All Tweets are geocoded and were collected in the US between 2013 and 2014. From this, a US Twitter swearing data set was compiled (see Huang et al. 2016 and Grieve et al. 2017 for more information).
The initial step now is reading in the data set, which is hosted on my GitHub.
norm_swear <- read.table("https://raw.githubusercontent.com/danaroemling/mapping/main/r_ladies_april23/MAPPING_SWEARING.csv",
header = TRUE, sep = ",")
The basic dimensions of the data set are 52 swear words measured across 3,076 locations; together with the column holding the location (state plus county), that makes 53 columns.
dim(norm_swear)
## [1] 3076 53
The locations are coded as state-county pairs. These are the first 10 rows of our data set.
head(norm_swear, 10)
## county ass asshole bastard bitch bitched bitchy bloody
## 1 alabama,autauga 1520421 49600 9538 962106 6995 8903 5087
## 2 alabama,baldwin 1246775 54318 6578 807348 2334 7851 14004
## 3 alabama,barbour 2263661 29188 3243 959948 3243 6486 3243
## 4 alabama,bibb 1451192 14629 2926 1009398 0 8777 0
## 5 alabama,blount 559433 72969 4230 506556 2115 5288 3173
## 6 alabama,bullock 2168413 56605 0 1184354 0 8708 0
## 7 alabama,butler 2638306 38680 11282 1806683 6447 4835 3223
## 8 alabama,calhoun 1604872 38763 8012 917534 2166 5197 4115
## 9 alabama,chambers 1881425 34756 5902 1438120 1312 1967 20329
## 10 alabama,cherokee 380377 37028 1683 272660 1683 6732 5049
## bullshit cock crap crappy cunt damn damnit damned darn dick
## 1 120184 15897 146255 13354 22892 1206925 19077 8903 13990 210481
## 2 98452 7002 109910 10397 9124 907073 9760 10185 10185 113729
## 3 74591 3243 113507 19458 3243 1258310 0 25945 22701 136209
## 4 105328 0 90699 8777 2926 1176168 17555 2926 17555 93625
## 5 101523 9518 201988 6345 9518 469543 13748 23266 8460 59222
## 6 182878 8708 21771 4354 0 1240960 0 0 4354 300443
## 7 164390 9670 46738 4835 0 1513359 3223 11282 14505 267537
## 8 95500 8446 74711 6713 10395 1027543 14292 14292 12344 147689
## 9 155419 7869 55741 3935 8525 1080065 13116 8525 10492 172469
## 10 40394 1683 121182 15148 3366 336617 5049 11782 6732 38711
## dickhead douche douchebag dumbass dyke fag faggot fatass freaking friggin
## 1 3179 14626 6359 43241 2544 42605 40697 6359 167876 2544
## 2 2971 18884 5729 29069 2546 19521 15489 4031 170593 2546
## 3 0 3243 3243 22701 0 9729 0 0 175126 3243
## 4 2926 5852 0 11703 2926 2926 8777 0 187251 8777
## 5 9518 13748 4230 25381 0 16920 32783 4230 195643 5288
## 6 0 4354 0 52251 17417 13063 52251 0 47897 0
## 7 0 4835 1612 29010 0 6447 12893 1612 78972 1612
## 8 650 7363 2815 18840 3898 11044 9312 3032 104378 2599
## 9 1967 656 0 26231 0 9181 9837 8525 127221 1312
## 10 1683 6732 1683 15148 0 1683 10099 0 107717 13465
## fuck fucked fucker fuckery fucking goddamn gosh hell hoe homo
## 1 1441570 212388 21620 6359 592017 5087 69948 695667 268347 17169
## 2 1137714 139191 15065 2546 462767 5941 81690 573101 252920 7426
## 3 1115615 158910 12972 3243 376196 9729 64861 901573 369710 3243
## 4 1351715 236989 20481 17555 833850 8777 38035 506162 298431 2926
## 5 775168 88832 12690 3173 319374 4230 132191 379653 131134 1058
## 6 1941993 278672 21771 8708 574760 13063 30480 735867 335277 0
## 7 2427177 328781 17728 14505 676902 27398 48350 862244 515735 1612
## 8 1305163 188184 12127 8879 401056 7363 52189 747323 293645 6063
## 9 1800109 220997 17706 5246 445273 15739 58364 908252 398713 3935
## 10 464532 50493 33662 5049 323152 1683 42077 373645 69006 5049
## jackass motherfucker motherfucking nigger piss pissed pissy pussy shit
## 1 13354 12082 3179 5087 69948 169148 11446 197763 2352169
## 2 4668 4456 2546 2971 71081 152770 4456 103969 1733094
## 3 0 19458 3243 3243 64861 139452 6486 204313 2085293
## 4 0 5852 5852 0 67293 201880 0 152141 2390371
## 5 2115 7403 3173 0 102580 155457 10575 32783 905244
## 6 4354 30480 0 13063 65314 117565 4354 296089 3239557
## 7 0 9670 4835 0 78972 262702 11282 269149 3932477
## 8 1083 7579 7146 3032 57170 159816 4331 161332 2190864
## 9 3279 21641 11148 1967 53774 172469 2623 255097 2863124
## 10 10099 0 0 0 42077 72373 1683 26929 540270
## shittiest shitty slut slutty twat whore
## 1 1908 52779 37518 3179 3179 46420
## 2 3395 41163 31615 5305 3819 32676
## 3 9729 22701 32431 0 0 25945
## 4 0 20481 29258 0 0 29258
## 5 2115 43359 35956 4230 2115 45474
## 6 4354 65314 21771 4354 0 13063
## 7 0 35457 19340 6447 4835 24175
## 8 866 25337 19273 3248 1083 20573
## 9 656 8525 16394 1967 1312 28198
## 10 1683 18514 10099 1683 6732 38711
For each county, Grieve et al. (2017) measured the relative frequency per billion words of each word in all the Tweets originating from that county: the frequency of that word in those Tweets is divided by the total number of words in those Tweets, and the result is multiplied by 1 billion. These swear words are all in the top 10,000 most frequent word types in the corpus. Here is a summary of the swear words.
summary(norm_swear[, 2:ncol(norm_swear)])
## ass asshole bastard bitch
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 633648 1st Qu.: 42586 1st Qu.: 4957 1st Qu.: 522379
## Median : 861798 Median : 63874 Median : 9382 Median : 727158
## Mean :1017504 Mean : 67861 Mean : 11108 Mean : 790053
## 3rd Qu.:1265534 3rd Qu.: 86399 3rd Qu.: 13992 3rd Qu.: 996426
## Max. :8904228 Max. :567215 Max. :310376 Max. :7340226
## bitched bitchy bloody bullshit
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 0 1st Qu.: 2677 1st Qu.: 3595 1st Qu.: 84135
## Median : 3382 Median : 6802 Median : 8779 Median :111614
## Mean : 4898 Mean : 8686 Mean : 11871 Mean :113866
## 3rd Qu.: 6328 3rd Qu.: 11153 3rd Qu.: 14800 3rd Qu.:139169
## Max. :508411 Max. :283607 Max. :591876 Max. :714967
## cock crap crappy cunt
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 5280 1st Qu.: 60728 1st Qu.: 4996 1st Qu.: 7365
## Median : 11406 Median : 88146 Median : 9905 Median : 17118
## Mean : 14392 Mean : 98504 Mean : 11993 Mean : 21028
## 3rd Qu.: 17621 3rd Qu.:124366 3rd Qu.: 14921 3rd Qu.: 28560
## Max. :1242999 Max. :821355 Max. :244499 Max. :435954
## damn damnit damned darn
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 578292 1st Qu.: 6902 1st Qu.: 4680 1st Qu.: 8764
## Median : 742200 Median : 14019 Median : 9002 Median : 13997
## Mean : 794290 Mean : 17008 Mean : 11536 Mean : 17030
## 3rd Qu.: 944634 3rd Qu.: 22584 3rd Qu.: 14364 3rd Qu.: 20611
## Max. :3846951 Max. :368363 Max. :235349 Max. :263481
## dick dickhead douche douchebag
## Min. : 0 Min. : 0.0 Min. : 0 Min. : 0
## 1st Qu.: 106660 1st Qu.: 0.0 1st Qu.: 10964 1st Qu.: 0
## Median : 152443 Median : 828.5 Median : 20833 Median : 4616
## Mean : 158922 Mean : 2402.5 Mean : 28204 Mean : 7056
## 3rd Qu.: 199590 3rd Qu.: 2873.2 3rd Qu.: 37840 3rd Qu.: 9268
## Max. :1426300 Max. :65772.0 Max. :357483 Max. :275330
## dumbass dyke fag faggot
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 16770 1st Qu.: 0 1st Qu.: 9904 1st Qu.: 9832
## Median : 25860 Median : 1272 Median : 18970 Median : 20571
## Mean : 28604 Mean : 2749 Mean : 22751 Mean : 25161
## 3rd Qu.: 35503 3rd Qu.: 3755 3rd Qu.: 30006 3rd Qu.: 34156
## Max. :301841 Max. :133627 Max. :301341 Max. :308339
## fatass freaking friggin fuck
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 0 1st Qu.: 83326 1st Qu.: 0 1st Qu.: 982700
## Median : 2746 Median :118184 Median : 3146 Median :1392849
## Mean : 3993 Mean :129884 Mean : 5199 Mean :1429113
## 3rd Qu.: 5230 3rd Qu.:162076 3rd Qu.: 6042 3rd Qu.:1835837
## Max. :157093 Max. :900328 Max. :307630 Max. :9527509
## fucked fucker fuckery fucking
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 116515 1st Qu.: 12694 1st Qu.: 0 1st Qu.: 497703
## Median : 172967 Median : 21610 Median : 1760 Median : 724440
## Mean : 177169 Mean : 25477 Mean : 3783 Mean : 771299
## 3rd Qu.: 231936 3rd Qu.: 32686 3rd Qu.: 5170 3rd Qu.: 991083
## Max. :1133503 Max. :261505 Max. :277937 Max. :4075971
## goddamn gosh hell hoe
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 2482 1st Qu.: 47687 1st Qu.: 407059 1st Qu.: 64499
## Median : 10120 Median : 72236 Median : 498528 Median : 110291
## Mean : 12649 Mean : 82527 Mean : 531404 Mean : 155724
## 3rd Qu.: 17096 3rd Qu.: 103684 3rd Qu.: 614383 3rd Qu.: 200468
## Max. :231535 Max. :2601908 Max. :2770083 Max. :1949566
## homo jackass motherfucker motherfucking
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 0 1st Qu.: 0 1st Qu.: 4362 1st Qu.: 0
## Median : 6561 Median : 4375 Median : 10130 Median : 3643
## Mean : 7823 Mean : 5626 Mean : 11719 Mean : 4905
## 3rd Qu.: 10304 3rd Qu.: 7211 3rd Qu.: 15396 3rd Qu.: 6450
## Max. :276932 Max. :154447 Max. :382482 Max. :236967
## nigger piss pissed pissy
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 0 1st Qu.: 55314 1st Qu.:125086 1st Qu.: 0
## Median : 3022 Median : 71656 Median :160544 Median : 4476
## Mean : 4694 Mean : 75528 Mean :168064 Mean : 7008
## 3rd Qu.: 6185 3rd Qu.: 91372 3rd Qu.:204570 3rd Qu.: 8866
## Max. :295300 Max. :475602 Max. :747938 Max. :293600
## pussy shit shittiest shitty
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 59199 1st Qu.: 1207753 1st Qu.: 0 1st Qu.: 41925
## Median : 98144 Median : 1608105 Median : 2910 Median : 70316
## Mean : 120961 Mean : 1753575 Mean : 4034 Mean : 76213
## 3rd Qu.: 154132 3rd Qu.: 2172223 3rd Qu.: 5878 3rd Qu.:102168
## Max. :1488628 Max. :12309084 Max. :69425 Max. :550661
## slut slutty twat whore
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 24016 1st Qu.: 0 1st Qu.: 0 1st Qu.: 25470
## Median : 38572 Median : 4735 Median : 3116 Median : 37280
## Mean : 43301 Mean : 5727 Mean : 4640 Mean : 41505
## 3rd Qu.: 55780 3rd Qu.: 7934 3rd Qu.: 6112 3rd Qu.: 52613
## Max. :547945 Max. :150670 Max. :130014 Max. :547945
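The normalisation described above is easy to reproduce by hand. A minimal sketch with invented counts (both numbers below are made up for illustration):

```r
# hypothetical county: 62 tokens of "damn" in Tweets totalling 50,000 words
token_count <- 62
total_words <- 50000
# relative frequency per billion words
(token_count / total_words) * 1e9
## [1] 1240000
```

This is the scale the values in the summary above are on, which is why even rare words have counts in the thousands.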
Before we map our swearing data, we need to understand the basics of cartography in R.
First, we need to get a map of the US, which we will format and use as a base to plot our swear word relative frequencies onto. There are several stages to setting up a nice map; aside from the first step, which just involves reading in the underlying map, they’re all optional.
Fortunately, working with US data is very easy in R, since all the necessary maps can be accessed either in library(maps) or through ggplot2. We use ggplot’s map_data function to extract the relevant information from the package.
usa <- map_data("usa")
Now we’ll have a look at the very basic US map. For this we’ll need ggplot2. For any ggplot to work, we need three basic components: data, aesthetics, and geoms. The data is the resource we use and want to visualise. The geometric objects (stuff like points, shapes, lines etc.) are the format into which the data is put. We need the aesthetics to link the data and the geometric objects (geoms), so that R knows how the data is visualised.
ggplot() + geom_polygon(data = usa, aes(x = long, y = lat, group = region))
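The group aesthetic deserves a quick illustration, since we’ll rely on it again for the county maps. Here is a tiny made-up data set in the same long/lat/group format (the coordinates are invented): without group, ggplot would join all six points into one self-intersecting shape; with it, each group is drawn as its own polygon.

```r
# two invented triangles in long/lat/group format
toy <- data.frame(long = c(0, 1, 0.5, 2, 3, 2.5),
lat = c(0, 0, 1, 0, 0, 1),
group = c(1, 1, 1, 2, 2, 2))
# group = group tells ggplot which rows belong to one closed polygon
ggplot() + geom_polygon(data = toy, aes(x = long, y = lat, group = group))
```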
If you want to map other countries, you can download and read in the base mapping data (e.g. shapefiles), which are available from various sources. This is especially interesting if you’re looking to work with administrative regions and the like. For country outlines, you can also use library(rnaturalearth) or library(rworldmap). The example below shows how to produce a map of Germany, Austria and Switzerland (= German-Speaking Area, GSA).
What this code chunk does is retrieve the outlines of our three countries by name from Natural Earth and return them as an sf object, which contains the coordinates we need for mapping.
gsa_outline <- ne_countries(country = c("Austria", "Germany", "Switzerland"), returnclass = "sf",
scale = "large")
After this, we can have a look at our three countries using ggplot. The geom_sf layer takes care of the map’s coordinate system, so the relationship between x and y (the aspect ratio) is correct automatically.
gsa <- ggplot(data = gsa_outline) + geom_sf()
gsa
Now, we need to make sure our US data can be mapped, which means we don’t just need the outline of the US, but we need the counties. We can extract them from our maps package.
counties <- map_data("county")
ggplot() +
geom_polygon(data = counties,
aes(x = long, y = lat, group = group),
# to see the counties we add a colour for outline and filling
color = "black", fill = "lightgrey",
linewidth = .1 )
Now that we have a basic map of the US, we can make it look a bit nicer, so that subsequent maps are easier to read.
ggplot() +
geom_polygon(data = counties,
aes(x = long, y = lat, group = group),
color = "black", fill = "white",
linewidth = .1 ) +
theme_minimal() + # sets the theme for the plot
ggtitle("US Map with Counties") + # gives the plot a title
theme(axis.title.x = element_blank(), # removes x axis title, here longitude
axis.title.y = element_blank(), # removes y axis title, here latitude
axis.text.x = element_blank(), # removes x axis text, here coordinates
axis.text.y = element_blank(), # removes y axis text, here coordinates
panel.grid.major = element_blank(), # removes grid lines
panel.grid.minor = element_blank(), # removes grid lines
plot.title = element_text(hjust = 0.5)) # centres title
Now that we have a base map and our data read in, we need to make sure the data can be mapped. This might look a bit complicated, but what we’re doing is getting the coordinate data that we need to join to our existing data set.
First, we get a map of the counties (that is, the geo-information we need) and save it as us_geo (and have a little look). For this we need the package ‘sf’. We’re still using the same “maps” library as before, but since each county has multiple sets of coordinates, we need a format that can be matched to our data set, where each location is just one row; hence we handle it with ‘sf’. We then merge the two data sets into one using dplyr.
us_geo <- st_as_sf(maps::map(database = "county", plot = FALSE, fill = TRUE))
ggplot(data = us_geo) + geom_sf()
us_geo_swear <- us_geo %>%
left_join(norm_swear, by = c(ID = "county"))
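After a left join it is worth checking whether any map counties failed to find a matching row in the data, since spelling mismatches between sources are common. A quick sketch (this assumes, as in our data, that a matched row always has a non-missing value for ass):

```r
# counties present in the map but absent from the swearing data
# end up with NA in every swear-word column after the left join
us_geo_swear %>%
filter(is.na(ass)) %>%
nrow()
```

If this returns 0, every county polygon found a data row; anything else points at naming differences worth fixing before mapping.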
If you have a look at the new data frame us_geo_swear, you can see that it is essentially the same table as before, but that the last column now holds the geometry: every county is stored as a multipolygon, i.e. a list of coordinate points, which we need for plotting.
# shows us that it is a data frame
class(us_geo_swear)
## [1] "sf" "data.frame"
# you can see that we now have a data frame that contains multipolygons
head(us_geo_swear)
## Simple feature collection with 6 features and 53 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -88.01778 ymin: 30.24071 xmax: -85.06131 ymax: 34.2686
## Geodetic CRS: WGS 84
## ID ass asshole bastard bitch bitched bitchy bloody
## 1 alabama,autauga 1520421 49600 9538 962106 6995 8903 5087
## 2 alabama,baldwin 1246775 54318 6578 807348 2334 7851 14004
## 3 alabama,barbour 2263661 29188 3243 959948 3243 6486 3243
## 4 alabama,bibb 1451192 14629 2926 1009398 0 8777 0
## 5 alabama,blount 559433 72969 4230 506556 2115 5288 3173
## 6 alabama,bullock 2168413 56605 0 1184354 0 8708 0
## bullshit cock crap crappy cunt damn damnit damned darn dick
## 1 120184 15897 146255 13354 22892 1206925 19077 8903 13990 210481
## 2 98452 7002 109910 10397 9124 907073 9760 10185 10185 113729
## 3 74591 3243 113507 19458 3243 1258310 0 25945 22701 136209
## 4 105328 0 90699 8777 2926 1176168 17555 2926 17555 93625
## 5 101523 9518 201988 6345 9518 469543 13748 23266 8460 59222
## 6 182878 8708 21771 4354 0 1240960 0 0 4354 300443
## dickhead douche douchebag dumbass dyke fag faggot fatass freaking friggin
## 1 3179 14626 6359 43241 2544 42605 40697 6359 167876 2544
## 2 2971 18884 5729 29069 2546 19521 15489 4031 170593 2546
## 3 0 3243 3243 22701 0 9729 0 0 175126 3243
## 4 2926 5852 0 11703 2926 2926 8777 0 187251 8777
## 5 9518 13748 4230 25381 0 16920 32783 4230 195643 5288
## 6 0 4354 0 52251 17417 13063 52251 0 47897 0
## fuck fucked fucker fuckery fucking goddamn gosh hell hoe homo
## 1 1441570 212388 21620 6359 592017 5087 69948 695667 268347 17169
## 2 1137714 139191 15065 2546 462767 5941 81690 573101 252920 7426
## 3 1115615 158910 12972 3243 376196 9729 64861 901573 369710 3243
## 4 1351715 236989 20481 17555 833850 8777 38035 506162 298431 2926
## 5 775168 88832 12690 3173 319374 4230 132191 379653 131134 1058
## 6 1941993 278672 21771 8708 574760 13063 30480 735867 335277 0
## jackass motherfucker motherfucking nigger piss pissed pissy pussy shit
## 1 13354 12082 3179 5087 69948 169148 11446 197763 2352169
## 2 4668 4456 2546 2971 71081 152770 4456 103969 1733094
## 3 0 19458 3243 3243 64861 139452 6486 204313 2085293
## 4 0 5852 5852 0 67293 201880 0 152141 2390371
## 5 2115 7403 3173 0 102580 155457 10575 32783 905244
## 6 4354 30480 0 13063 65314 117565 4354 296089 3239557
## shittiest shitty slut slutty twat whore geom
## 1 1908 52779 37518 3179 3179 46420 MULTIPOLYGON (((-86.50517 3...
## 2 3395 41163 31615 5305 3819 32676 MULTIPOLYGON (((-87.93757 3...
## 3 9729 22701 32431 0 0 25945 MULTIPOLYGON (((-85.42801 3...
## 4 0 20481 29258 0 0 29258 MULTIPOLYGON (((-87.02083 3...
## 5 2115 43359 35956 4230 2115 45474 MULTIPOLYGON (((-86.9578 33...
## 6 4354 65314 21771 4354 0 13063 MULTIPOLYGON (((-85.66866 3...
# If you open the data frame and scroll to the last column, you can see the
# list in the list.
view(us_geo_swear)
Now that the data is prepared, we can try to map some swear words. Note that we’ve switched to geom_sf in the plot, because it can handle the sf geometry we’ve attached for the geolocation of our swear words. That also means we don’t need geom_polygon, though as the name suggests the two have similar functionality.
This first map is a very basic choropleth map based on our variable “ass”:
ggplot() + geom_sf(data = us_geo_swear, aes(fill = ass))
Let’s add our design to it:
ggplot() +
geom_sf(data = us_geo_swear,
aes(fill = ass)) +
theme_minimal() +
coord_sf(crs = "ESRI:102003") + # this sets the projection for the map, which is Albers
ggtitle("'Ass' Distribution in the US per County") +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5))
That looks sort of like what we want, so let’s rework it a bit. Note that we divide the occurrences of ‘ass’ by 10,000; since we’re dealing with high numbers, this makes our graph easier to read. The values were per billion words before; now they are per 100,000 words. So instead of, for example, the 49,600 occurrences of ‘asshole’ per billion words in Autauga, Alabama, we now use 4.96 occurrences per 100,000 words.
ggplot() +
geom_sf(data = us_geo_swear,
aes(fill = ass / 10000),
lwd = 0.1, # lwd sets the outline thickness of the polygons
color = "grey") + # this sets the outline colour
theme_minimal() +
coord_sf(crs = "ESRI:102003") +
ggtitle("'Ass' Distribution in the US per County") +
guides(fill = guide_legend(title = "Distribution")) + # this sets a new legend title
# here we start using some nicer colours
scale_fill_continuous(low = "white",
high = "mediumpurple4") +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5),
legend.title = element_text(size = 8))
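The rescaling inside the fill aesthetic is just a change of unit; dividing a per-billion-words rate by 10,000 re-expresses it per 100,000 words:

```r
# 49,600 occurrences per billion words, re-expressed per 100,000 words
49600 / 10000
## [1] 4.96
```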
We can see that there seems to be a trend towards ‘ass’ in the Southeast. Let’s see if we can find some more trends.
ggplot() +
geom_sf(data = us_geo_swear,
aes(fill = dickhead / 10000),
lwd = 0.1,
color = "grey") +
theme_minimal() +
coord_sf(crs = "ESRI:102003") +
ggtitle("'Dickhead' Distribution in the US per County") +
guides(fill = guide_legend(title = "Distribution")) +
scale_fill_continuous(low = "white",
high = "mediumpurple4") +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5),
legend.title = element_text(size = 8))
How about fuck, but in green?
ggplot() +
geom_sf(data = us_geo_swear,
aes(fill = fuck / 10000),
lwd = 0.1,
color = "grey") +
theme_minimal() +
coord_sf(crs = "ESRI:102003") +
ggtitle("'Fuck' Distribution in the US per County") +
guides(fill = guide_legend(title = "Distribution")) +
scale_fill_continuous(low = "white",
high = "aquamarine4") + # green this time?
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5),
legend.title= element_text(size = 8))
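Hand-picking low/high colours works, but ggplot2 also ships ready-made continuous palettes. As a sketch, here is the same map with the built-in viridis scale (reversed so that low values stay light, matching our maps so far):

```r
ggplot() +
geom_sf(data = us_geo_swear,
aes(fill = fuck / 10000),
lwd = 0.1,
color = "grey") +
theme_minimal() +
coord_sf(crs = "ESRI:102003") +
# perceptually uniform, colour-blind-friendly palette;
# direction = -1 reverses it so low values are light
scale_fill_viridis_c(direction = -1)
```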
You can see that we are able to produce nice-looking maps, but we can’t really see the distribution of the feature well. So we introduce a concept called class intervals, in our case quantiles.
In the next step for the swearing maps we’ll implement quantiles. What that means is we split the relative frequency distribution for the word we want to map into intervals. We’re using “quantile” style intervals here, where the values are split so each interval contains a roughly equal number of values, although the range of each interval will likely vary (often considerably).
In order to do this we’ll first pick a swear word, select it together with its geometry, and create a new data frame. Then we’ll calculate the quantiles for our swear word and add them as a factor column. Exchange the swear word in this code to run it with a different one.
# select the columns you need
quant_swear <- us_geo_swear %>%
select(fuck, geom)
# calculate quantiles
q <- quantile(quant_swear$fuck, na.rm = TRUE)
# add factor given the quantiles to our list
quant_swear$quant <- factor(findInterval(quant_swear$fuck, q))
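To see what quantile() and findInterval() are doing here, a small self-contained example with ten made-up values helps. quantile() returns the five quartile boundaries (0%, 25%, 50%, 75%, 100%), and findInterval() assigns each value the index of the bin it falls into:

```r
x <- c(2, 7, 13, 25, 40, 41, 55, 70, 85, 100)
q <- quantile(x) # the five quartile boundaries
findInterval(x, q) # bin index (1-5) for each value
## [1] 1 1 1 2 2 3 3 4 4 5
```

Note that the maximum always lands in its own top bin, which is why the factor ends up with five levels rather than four.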
Now we can map our data. Instead of filling the polygons by the frequency of our swear word, we use the quantiles we’ve just defined. Note that that means we’re going from continuous scale colours to discrete, so we need to change the colouring option of our map. That’s why we first define these colours.
cols <- c("1" = "white",
"2" = "lightsteelblue1",
"3" = "lightsteelblue2",
"4" = "lightsteelblue3",
"5" = "lightsteelblue4")
ggplot() +
# we've added na.omit to not have NAs plotted
geom_sf(data = na.omit(quant_swear),
aes(fill = quant),
lwd = 0.1,
color = "grey") +
# here we pass our colour list
scale_colour_manual(values = cols,
# and say we use it to fill
aesthetics = c("colour", "fill")) +
theme_minimal() +
coord_sf(crs = "ESRI:102003") +
ggtitle("'Fuck' Quantile Distribution in the US") +
guides(fill = guide_legend(title = "Quantiles")) +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5),
legend.title = element_text(size = 8))
Let’s map the quantiles of another swear word and change the colours for the map. If you want to play around with colour yourself, this website offers a good overview.
quant_swear <- us_geo_swear %>%
select(shit, geom)
q <- quantile(quant_swear$shit, na.rm = TRUE)
quant_swear$quant <- factor(findInterval(quant_swear$shit, q))
cols <- c("1" = "white",
"2" = "rosybrown1",
"3" = "rosybrown2",
"4" = "rosybrown3",
"5" = "rosybrown4")
ggplot() +
geom_sf(data = na.omit(quant_swear),
aes(fill = quant),
lwd = 0.1,
color = "grey") +
scale_colour_manual(values = cols,
aesthetics = c("colour", "fill")) +
theme_minimal() +
coord_sf(crs = "ESRI:102003") +
ggtitle("'Shit' Quantile Distribution in the US") +
guides(fill = guide_legend(title = "Quantiles")) +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5),
legend.title = element_text(size = 8))
If we still have time, we will create a different-looking map and add some cities, so you can see another example of making maps.
As the last bit, we’ll try out adding another layer to our ggplot maps. Remember our map for the German-speaking area.
gsa
If we wanted to add cities to this, because we’re interested in data at the city level, we can do this by using geom_point. Let’s first load some data to do this.
gsa_data <- read.table("https://raw.githubusercontent.com/danaroemling/mapping/main/r_ladies_april23/MAPPING_DIALECT.csv",
header = TRUE, sep = ",")
Note that we have a data set which contains both the linguistic information (here the counts and proportion) and the geolocation information. With this, we can map the data using the cities.
First, we reuse the gsa base map we saved earlier, just as we did before. Only in the geom_point layer do we add the city data.
gsa +
theme_minimal() +
geom_point(data = gsa_data, # here we add the cities to our map
aes(x = Long, y = Lat, col = Proportion, size = (Count1+Count2)),
alpha = 0.9) +
guides(size = "none") +
scale_color_gradient(low = "seagreen3", high = "mediumpurple3") +
ggtitle("Schau vs Guck in the GSA") +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
panel.grid.major = element_blank(),
plot.title = element_text(hjust = 0.5))
What this map shows us is the proportion of usage of the two features in the given cities. We can see that one feature is more prevalent in the north and one in the south, so with our map we can easily visualise the distribution of this linguistic variable, which is much easier to understand than the table we have as gsa_data.
As the last step we want to save our map.
ggsave("german_map.png", width = 6.5, height = 5.5)
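By default, ggsave() writes the most recently displayed plot. If you keep your plots in variables, you can also pass one explicitly and control the resolution; the file name here is just an example:

```r
# save a stored plot object instead of the last one displayed
final_map <- gsa + theme_minimal()
ggsave("gsa_base_map.png",
plot = final_map,
width = 6.5,
height = 5.5,
dpi = 300)
```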